Hierarchies allow feature sharing between objects at multiple levels of representation, can code exponential variability in a very compact way, and enable fast inference. This makes them potentially suitable for learning and recognizing a larger number of object classes. However, the success of hierarchical approaches has so far been hindered by the use of hand-crafted features or predetermined grouping rules. This paper presents a novel framework for learning a hierarchical compositional shape vocabulary for representing multiple object classes. The approach takes simple contour fragments and learns their frequent spatial configurations. These are recursively combined into increasingly complex and class-specific shape compositions, each exhibiting a high degree of shape variability. At the top level of the vocabulary, the compositions are sufficiently large and complex to represent the whole shapes of the objects. We learn the vocabulary layer after layer, by gradually increasing the size of the window of analysis and reducing the spatial resolution at which the shape configurations are learned. The lower layers are learned jointly on images of all classes, whereas the higher layers of the vocabulary are learned incrementally, by presenting the algorithm with one object class after another. The experimental results show that the learned multi-class object representation scales favorably with the number of object classes and achieves state-of-the-art detection performance with both faster inference and shorter training times.
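To make the idea of learning "frequent spatial configurations" concrete, the following is a minimal toy sketch and not the method of the paper. It assumes that each image is already reduced to a list of detected parts `(part_id, x, y)`, that a composition is a pair of parts with a quantized relative offset, and that "frequent" means occurring in at least `min_support` images. The function name `learn_compositions` and the parameters `window`, `cell`, and `min_support` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def learn_compositions(images, window=4, cell=2, min_support=2):
    """Toy sketch: find part pairs that co-occur with a stable relative offset.

    images: list of images, each a list of (part_id, x, y) detections.
    window: maximum absolute offset (per axis) for two parts to be paired.
    cell: bin size used to quantize the offset (coarser spatial resolution).
    Returns sorted (part_a, part_b, dx_bin, dy_bin) tuples found in at
    least min_support images.
    """
    support = Counter()
    for parts in images:
        seen = set()  # count each configuration at most once per image
        # Sorting gives every unordered pair one canonical orientation.
        for (a, ax, ay), (b, bx, by) in combinations(sorted(parts), 2):
            dx, dy = bx - ax, by - ay
            if max(abs(dx), abs(dy)) <= window:
                seen.add((a, b, dx // cell, dy // cell))
        support.update(seen)
    return sorted(c for c, n in support.items() if n >= min_support)


# Hypothetical detections: part 1 lies to the right of part 0 in two images.
img1 = [(0, 0, 0), (1, 2, 0)]
img2 = [(0, 5, 5), (1, 7, 5)]
img3 = [(0, 0, 0), (1, 0, 3)]
vocabulary = learn_compositions([img1, img2, img3])
```

Under these assumptions, a further layer could be sketched by treating each learned composition as a new part and calling the function again with a larger `window` and a larger `cell`, mirroring the growing window of analysis and the reduced spatial resolution described above.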